Kaggle Notebook used as a reference - https://www.kaggle.com/code/dtosidis/flower-classifier-tensorflow
Problem statement
The Oxford Flowers 102 dataset consists of 102 categories of flowers, with each category containing between 40 and 258 images. The dataset is used for image classification tasks where the goal is to correctly classify images of flowers into their respective categories.
The problem statement for this dataset can be defined as follows: given an image of a flower, classify it into one of the 102 categories of flowers. This is a typical image classification problem that is encountered in many real-world applications such as content-based image retrieval systems, plant identification, and many others.
In order to solve this problem, we have used transfer learning and fine-tuning techniques. Transfer learning allows us to leverage the knowledge and weights learned by a pre-trained model, such as MobileNetV2, to solve our classification task. Fine-tuning refers to the process of adapting the pre-trained model to our specific task by training the top layers of the network on our dataset while keeping the lower layers fixed. This allows the network to learn specific features related to our dataset, improving its performance on our task.
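The two-phase recipe described above can be sketched as follows. This is a minimal illustration rather than the notebook's exact training code: `weights=None` is used only so the sketch runs without downloading the ImageNet weights (in practice you would pass `weights='imagenet'`), and unfreezing the last 20 layers is an arbitrary example choice.

```python
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2

# Build the base network. weights=None keeps this sketch offline;
# the real workflow uses weights='imagenet'.
base = MobileNetV2(weights=None, include_top=False, input_shape=(224, 224, 3))

# Phase 1 (transfer learning): freeze the whole base and train only new top layers.
base.trainable = False

# Phase 2 (fine-tuning): unfreeze the last few layers and continue training
# with a much lower learning rate so the pretrained features are preserved.
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(102, activation='softmax'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

The low learning rate in phase 2 is the key detail: large gradient steps would overwrite the pretrained weights and defeat the purpose of transfer learning.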
This code imports necessary libraries including TensorFlow, NumPy, os, and Matplotlib to build an image classification model using the MobileNetV2 architecture. It also imports the ImageDataGenerator and TensorFlow Datasets (tfds) libraries for data preprocessing and management. The MobileNetV2 architecture is used as the base model, with a global average pooling layer and a dense layer added on top for classification.
!pip install tensorflow-datasets

import tensorflow as tf
import numpy as np
import os
import matplotlib.pylab as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
import tensorflow_datasets as tfds
This code snippet imports the TensorFlow library and prints the version of the installed TensorFlow package. It can be useful for verifying the installed version of TensorFlow before running other code that may depend on a specific version.
import tensorflow as tf
print(tf.__version__)
This code loads the oxford_flowers102 dataset using TensorFlow Datasets (tfds) and splits it into three datasets: training, testing, and validation. The with_info=True argument returns additional information about the dataset, such as the number of classes and the shape of the images. The as_supervised=True argument loads the dataset in a tuple structure, with the first element being the image and the second element being the corresponding label.
# Load the dataset
dataset, info = tfds.load(name='oxford_flowers102', with_info=True, as_supervised=True, split=['train', 'test', 'validation'])
train_dataset, test_dataset, val_dataset = dataset
Next, the code reloads the test split of "oxford_flowers102" together with its metadata ("info_train") and displays a few example images using the tfds.show_examples function.
train, info_train = tfds.load(name='oxford_flowers102', with_info=True, split='test')
tfds.show_examples(train, info_train)
info
This line returns the number of samples in the first split (the training set) of the dataset variable loaded above. It does so by materializing the split as a list of (image, label) pairs and taking its length.
len(list(dataset[0]))
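An alternative that avoids materializing the whole split in memory is tf.data's cardinality, which reads the element count from the pipeline's metadata. A small sketch on a synthetic dataset (the shapes here are illustrative, not the flower data):

```python
import tensorflow as tf

# Synthetic stand-in for a dataset split: 10 feature vectors of length 4.
ds = tf.data.Dataset.from_tensor_slices(tf.zeros([10, 4]))

# cardinality() reports the element count without iterating over the examples.
n = int(tf.data.experimental.cardinality(ds).numpy())
```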
This code sets the image dimensions to 224x224 pixels and batch size to 32 for the image classification model. It also sets the number of classes based on the information extracted from the dataset.
# Set image dimensions and batch size
img_width, img_height = 224, 224
batch_size = 32
# Set number of classes
num_classes = info.features['label'].num_classes
This code calculates the number of images in the train, test, and validation splits of the Oxford Flowers 102 dataset. It uses the info variable obtained when the dataset was loaded with tfds.load(). The number of examples in each split is read from the num_examples attribute of the corresponding SplitInfo entry in info.splits, and the counts are stored in the train_count, test_count, and val_count variables.
# Calculate the number of images in the train, test, and validation splits
train_count = info.splits['train'].num_examples
test_count = info.splits['test'].num_examples
val_count = info.splits['validation'].num_examples
This code creates a dictionary to store the distribution of samples across the different classes in the training dataset of the Oxford Flowers 102 dataset. It loops through each image in the dataset and counts the number of samples for each class. Then, it plots a bar chart of the class distribution using Matplotlib, with the class labels on the x-axis and the number of samples on the y-axis. This visualization helps to understand the distribution of samples across the different classes, which is important for training and evaluating a machine learning model on this dataset.
# Create a dictionary to store the distribution of samples across the different classes
class_dist = {}
# Loop through each class and count the number of samples
for img, label in train_dataset:
    class_name = info.features['label'].int2str(label.numpy())
    if class_name in class_dist:
        class_dist[class_name] += 1
    else:
        class_dist[class_name] = 1
# Plot a bar chart of the class distribution
plt.bar(class_dist.keys(), class_dist.values())
plt.title('Class Distribution')
plt.xlabel('Class Label')
plt.ylabel('Number of Samples')
plt.show()
This code creates two lists to store the widths and heights of images in the train dataset, then loops through each image in the train set and records its size. After that, it plots histograms of the image widths and heights. The first histogram shows the distribution of image widths in pixels, while the second histogram shows the distribution of image heights in pixels. The histograms are plotted using matplotlib.
# Create lists to store the image widths and heights
widths = []
heights = []
# Loop through each image in the train set and record its size
for img, label in train_dataset:
    widths.append(img.shape[1])
    heights.append(img.shape[0])
# Plot histograms of the image widths and heights
plt.hist(widths, bins=20)
plt.title('Image Width Distribution')
plt.xlabel('Width (pixels)')
plt.ylabel('Number of Images')
plt.show()
plt.hist(heights, bins=20)
plt.title('Image Height Distribution')
plt.xlabel('Height (pixels)')
plt.ylabel('Number of Images')
plt.show()
# Resize the training images to a fixed 224x224 so they can be batched together
train_dataset = train_dataset.map(lambda img, label: (tf.image.resize(img, [224, 224]), label))
This code extracts a batch of images from the train set, scales the pixel values with preprocess_input, and reshapes the batch into a long list of RGB pixels so that each column corresponds to one color channel. It then plots histograms of the pixel intensities for the red, green, and blue channels: the x-axis is the (scaled) pixel value and the y-axis is the number of pixels with that value, with the bins parameter controlling the histogram resolution. The resulting plot gives insight into the distribution of pixel intensities in the dataset.
# Extract a batch of images from the train set
image_batch, label_batch = next(iter(train_dataset.batch(batch_size)))
# Scale the pixel values to [-1, 1] and reshape to (num_pixels, 3),
# one column per color channel
image_batch = preprocess_input(image_batch)
image_batch = tf.reshape(image_batch, [-1, 3])
# Plot histograms of the pixel intensities for each color channel
plt.hist(image_batch[:, 0], bins=20, alpha=0.5, label='Red')
plt.hist(image_batch[:, 1], bins=20, alpha=0.5, label='Green')
plt.hist(image_batch[:, 2], bins=20, alpha=0.5, label='Blue')
plt.title('Pixel Intensity Distribution')
plt.xlabel('Pixel Value')
plt.ylabel('Number of Pixels')
plt.legend()
plt.show()
This code creates a dictionary label_count to store the count of images for each label in the train dataset. It loops through each image and label in the train_dataset, and if the label is not already in the label_count dictionary, it adds it and sets its count to 1. If the label is already in the dictionary, it increments the count by 1.
# Get label distribution in the train dataset
label_count = {}
for image, label in train_dataset:
    if label.numpy() not in label_count:
        label_count[label.numpy()] = 1
    else:
        label_count[label.numpy()] += 1
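For what it's worth, the counting loop above can be written more compactly with collections.Counter. A small sketch on hypothetical integer labels standing in for the label.numpy() values:

```python
from collections import Counter

# Hypothetical integer labels, standing in for label.numpy() values
labels = [3, 1, 3, 2, 1, 3]
label_count = Counter(labels)
```

Counter is a dict subclass, so code that reads label_count.keys() and label_count.values() works on it unchanged.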
This code creates a bar graph to show the label distribution in the train dataset. It first counts the number of samples for each label in the train dataset and stores the count in a dictionary called label_count. Then, it uses plt.bar() and plt.xticks() functions to create a bar graph with the x-axis representing the label values and the y-axis representing the corresponding count of samples for each label. The plt.xlabel(), plt.ylabel(), and plt.title() functions are used to set the x-axis label, y-axis label, and graph title, respectively. Finally, plt.show() is used to display the graph.
# Create a bar graph to show label distribution
plt.bar(range(len(label_count)), list(label_count.values()))
plt.xticks(range(len(label_count)), list(label_count.keys()))
plt.xlabel('Label')
plt.ylabel('Count')
plt.title('Label Distribution in Train Dataset')
plt.show()
This code counts the distribution of image sizes in the train dataset. Note that because the training images were already resized to 224x224 above, every image now has the same shape and the distribution collapses to a single size.
# Get image size distribution in the train dataset
size_count = {}
for image, label in train_dataset:
    size = image.numpy().shape
    if size not in size_count:
        size_count[size] = 1
    else:
        size_count[size] += 1
This code creates a bar graph to show the distribution of image sizes in the train dataset. It first creates a dictionary size_count to count the number of occurrences of each image size. Then it plots a bar graph with the x-axis representing the image sizes, the y-axis representing the count of images with that size, and the bars showing the count for each size. The x-axis labels are rotated 90 degrees for readability.
# Create a bar graph to show image size distribution
plt.bar(range(len(size_count)), list(size_count.values()))
plt.xticks(range(len(size_count)), [str(size) for size in size_count.keys()], rotation=90)
plt.xlabel('Image Size')
plt.ylabel('Count')
plt.title('Image Size Distribution in Train Dataset')
plt.show()
The code sets up data augmentation and preprocessing using the ImageDataGenerator class from Keras. The preprocessing_function argument specifies a function to apply to each image before any other transformation. The remaining arguments request random rotations, zooms, shears, and horizontal flips during training; such transformations increase the diversity of the training data and improve the robustness of the model. Note, however, that the tf.data pipelines built below are what actually feed the model, so these augmentation settings are defined here but not applied in this notebook.
# Set up data augmentation and preprocessing
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=20,
    zoom_range=0.2,
    shear_range=0.2,
    horizontal_flip=True
)
This code sets up an ImageDataGenerator for the test set that applies the preprocess_input function to each image. preprocess_input is the MobileNetV2-specific preprocessing function from tf.keras.applications, which scales pixel values to the range -1 to 1.
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(preprocessing_function=preprocess_input)
This code creates tf.data pipelines for the train, validation, and test sets. Each pipeline resizes the images to 224x224 and applies preprocess_input. For the train_generator, the images are also shuffled with a buffer size of 1000 before being batched; randomizing the order in which images are presented during training helps prevent the model from overfitting to the order of the dataset.
# Create generators for train, validation, and test sets
train_generator = train_dataset.map(lambda x, y: (preprocess_input(tf.image.resize(x, (img_width, img_height))), y)).shuffle(1000).batch(batch_size)
val_generator = val_dataset.map(lambda x, y: (preprocess_input(tf.image.resize(x, (img_width, img_height))), y)).batch(batch_size)
test_generator = test_dataset.map(lambda x, y: (preprocess_input(tf.image.resize(x, (img_width, img_height))), y)).batch(batch_size)
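Pipelines like these are usually finished with prefetch(), which overlaps data preparation with training on the accelerator. A minimal sketch on synthetic data (the shapes are illustrative, not the flower images):

```python
import tensorflow as tf

# Synthetic images and labels standing in for the flower data.
images = tf.zeros([8, 32, 32, 3])
labels = tf.zeros([8], dtype=tf.int64)

ds = (tf.data.Dataset.from_tensor_slices((images, labels))
      .shuffle(8)                    # randomize sample order each epoch
      .batch(4)                      # group samples into batches
      .prefetch(tf.data.AUTOTUNE))   # prepare the next batch while the model trains

num_batches = sum(1 for _ in ds)
```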
This code loads the MobileNetV2 model architecture pre-trained on the ImageNet dataset, but excludes the fully connected top layers that are used for classification. The input shape is set to (img_width, img_height, 3) which is the size and number of color channels for the images in our dataset.
# Load MobileNetV2 model without top layers
base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(img_width, img_height, 3))
This code freezes the weights of all layers in the base_model so that they are not trainable during the training process.
# Freeze base layers
for layer in base_model.layers:
    layer.trainable = False
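A quick way to confirm that freezing took effect is to inspect the model's trainable_weights. A sketch with a tiny stand-in model (the layer sizes are arbitrary; the real code freezes MobileNetV2's layers the same way):

```python
import tensorflow as tf

# A tiny stand-in model to illustrate the effect of freezing layers.
inp = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(4)(inp)
out = tf.keras.layers.Dense(2)(x)
model = tf.keras.Model(inp, out)

n_before = len(model.trainable_weights)  # kernel + bias for each Dense layer
for layer in model.layers:
    layer.trainable = False
n_after = len(model.trainable_weights)   # nothing left to train
```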
This code adds custom top layers to the MobileNetV2 model. The output of the base model is passed through a global average pooling layer, followed by a fully connected dense layer with 1024 units and ReLU activation. Finally, a dense layer with one unit per class and a softmax activation produces a probability distribution over the 102 classes. (Softmax, rather than sigmoid, is the appropriate choice here because each image belongs to exactly one class.)
# Add custom top layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
This code calculates the number of steps per epoch and the number of validation steps. tf.data.experimental.cardinality(train_dataset).numpy() returns the number of elements in the (still unbatched) training dataset, which is divided by the batch size using the // integer-division operator, discarding any remainder. The validation steps are computed the same way for the validation dataset. These values ensure that each sample is seen once per epoch and that validation metrics are computed over the whole validation set.
# calculating the number of steps per epoch and validation steps:
steps_per_epoch = tf.data.experimental.cardinality(train_dataset).numpy() // batch_size
validation_steps = tf.data.experimental.cardinality(val_dataset).numpy() // batch_size
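Concretely, the oxford_flowers102 train and validation splits each contain 1,020 images, so with a batch size of 32 the integer division works out as follows:

```python
num_train, num_val, batch_size = 1020, 1020, 32

# 31 full batches per epoch; the remainder of 28 images is dropped by //
steps_per_epoch = num_train // batch_size
validation_steps = num_val // batch_size
```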
The code compiles the model with the specified optimizer, loss function, and evaluation metric. The model is defined as taking the base model's input and outputting the custom top layers' predictions. The optimizer used is Adam, the loss function is sparse categorical cross-entropy, and the evaluation metric is accuracy.
# Compile the model
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
This code trains the compiled model using the fit() method with the following parameters:
train_generator: a generator that provides batches of preprocessed training images and their corresponding labels steps_per_epoch: the number of steps to be performed in each epoch before declaring an epoch finished epochs: the number of times the entire training dataset is passed through the network validation_data: a generator that provides batches of preprocessed validation images and their corresponding labels validation_steps: the number of steps to be performed in each epoch during validation
The training progress is stored in the history object, which contains the values of the loss and accuracy metrics for the training and validation sets at each epoch.
# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=tf.data.experimental.cardinality(train_generator).numpy(),
    epochs=10,
    validation_data=val_generator,
    validation_steps=tf.data.experimental.cardinality(val_generator).numpy()
)
The code is evaluating the trained model on the test set using the evaluate() method provided by Keras. It calculates the test loss and test accuracy of the model on the test data provided by test_generator. The steps parameter in evaluate() specifies the total number of steps (batches) to evaluate before stopping. Finally, the code prints the test accuracy of the model on the test set.
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_generator, steps=tf.data.experimental.cardinality(test_generator).numpy())
print('Test accuracy:', test_acc)
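With 102 classes, top-1 accuracy is often complemented by a top-k metric such as top-5 accuracy. A sketch on synthetic predictions (5 classes stand in for 102, so with k=5 every true label is necessarily among the top 5 here):

```python
import numpy as np
import tensorflow as tf

# Synthetic scores for 4 samples over 5 classes.
y_true = np.array([0, 1, 2, 3])
y_pred = np.random.rand(4, 5).astype('float32')

# Fraction of samples whose true label is among the k highest-scoring classes.
top5 = tf.keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=5)
mean_top5 = float(tf.reduce_mean(top5))
```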
This code defines a function to calculate accuracy and uses it to get the training and validation accuracy of a trained model. It also plots the training and validation loss curves as well as the training and validation accuracy curves using Matplotlib.
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
# Define a function to calculate accuracy
def get_accuracy(model, data_generator):
    num_correct = 0
    num_total = 0
    for x_batch, y_batch in data_generator:
        y_pred = model.predict(x_batch)
        y_pred_classes = np.argmax(y_pred, axis=1)
        num_correct += np.sum(y_pred_classes == y_batch.numpy())
        num_total += len(y_batch)
    return num_correct / num_total
# Calculate accuracy on the training and validation sets
train_acc = get_accuracy(model, train_generator)
val_acc = get_accuracy(model, val_generator)
print("Training accuracy:", train_acc)
print("Validation accuracy:", val_acc)
# Plot the training and validation loss curves
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
# Plot the training and validation accuracy curves
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
This function takes an image and a label and returns a preprocessed, resized image along with a one-hot encoded label. The image is first cast to float32 and scaled with the same MobileNetV2 preprocess_input used during training, so that predictions see the same input distribution the model was trained on. The image is then resized to 224 x 224, and the label is one-hot encoded over the 102 classes with tf.one_hot().
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image = preprocess_input(image)
    return tf.image.resize(image, [224, 224]), tf.one_hot(label, num_classes)
This code is for iterating over a test dataset and making predictions using a trained model. For each sample in the dataset, it scales the image and converts the label to one-hot encoded format. It then uses the predict() method of the trained model to make predictions on the scaled image. It displays the original image, the actual label, and the predicted label using matplotlib.
for test_sample in dataset[1].take(15):
    image, label = test_sample[0], test_sample[1]
    image_scaled, label_arr = scale(test_sample[0], test_sample[1])
    image_scaled = np.expand_dims(image_scaled, axis=0)
    pred = model.predict(image_scaled)
    print(pred)
    plt.figure()
    plt.imshow(image)
    plt.show()
    print("Actual Label: %s" % info.features["label"].names[label.numpy()])
    print("Predicted Label: %s" % info.features["label"].names[np.argmax(pred)])
We have successfully trained a model to classify images of flowers into 102 categories with an accuracy of 78% on the test set. The model was trained using transfer learning with the MobileNetV2 architecture and achieved good results with only a few epochs of training. However, there is still room for improvement, and further tuning of the hyperparameters and architecture may lead to even better performance. Overall, this project demonstrates the power of deep learning in image classification tasks and its potential applications in various fields such as agriculture, medicine, and environmental monitoring.
Faster training: Because MobileNetV2 is a lightweight architecture, it typically trains faster than larger, more complex models.
Improved accuracy: By fine-tuning the MobileNetV2 architecture on a specific task, the model can potentially achieve better accuracy than a model trained from scratch.
Reduced need for large amounts of data: Because MobileNetV2 is pretrained on a large dataset, it has already learned a rich set of features that can be leveraged for a new task with less data.
Transfer learning: Fine-tuning a pretrained MobileNetV2 model can be an effective form of transfer learning, which allows the model to apply knowledge learned from one task to a new, related task. This can be especially useful when working with limited data.
Dataset Link - https://www.robots.ox.ac.uk/~vgg/data/flowers/102/
MobileNet - https://keras.io/api/applications/mobilenet/
Fine-tuning - https://towardsdatascience.com/fine-tuning-for-domain-adaptation-in-nlp-c47def356fd6
Use of Activation Function - https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/
Kaggle Notebook used as a reference - https://www.kaggle.com/code/dtosidis/flower-classifier-tensorflow